sspell - similar to Unix spell
version 1.4
Author: Maurice Castro
Release Date: 4 Jul 1992
Bug Reports: maurice@bruce.cs.monash.edu.au
This code has been placed by the Author into the Public Domain.
The code is NOT covered by any warranty, the user of the code is
solely responsible for determining the fitness of the program
for their purpose. No liability is accepted by the author for
the direct or indirect losses incurred through the use of this
program.
Segments of this code may be used for any purpose that the user
deems appropriate. It would be polite to acknowledge the source
of the code. If you modify the code and redistribute it please
include a message indicating your changes and how users may
contact you for support.
The author reserves the right to issue the official version of
this program. If you have useful suggestions or changes for the
code, please forward them to the author so that they might be
incorporated into the official version.
Please forward bug reports to the author via Internet.
* Introduction
The program SSPELL was written by the author to provide a Unix-like
spell checker on a PC. There are several utilities of this type already
available; however, most lacked at least one of the following:
1. Public Domain
2. Source Code
3. Simple, editable word list structure
4. Configurable prefix and suffix list
5. Minimal memory use
6. Unlimited word list length
7. Reasonable speed
8. Portable
The SSPELL program provides all these features. The program currently
compiles under Turbo C++ (Borland) for MS-DOS, DJGCC for MS-DOS, GCC
for Decstations and cc for Unix (OSx for Pyramid, SunOS for Sun 3/50,
Ultrix for Decstation 2100). Minor modifications will be required to
compile under other Unix variants.
* Features
The SSPELL program uses a sorted plain ASCII word list for its dictionary.
This makes adding new words to the list easy. Simply add the words and
re-sort the list.
To gain speed without loading the complete list into memory, a cache
of words recently retrieved from the word list is maintained. The disk
is searched only if the word is not found in the cache.
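The caching idea can be sketched roughly as follows. This is an
illustration only and is not taken from the SSPELL source: the names
cache_lookup, cache_insert, DEMO_CACHESIZE and DEMO_MAXSTR are made up
for the example, and the round-robin replacement policy is an assumption.

```c
#include <string.h>

#define DEMO_CACHESIZE 4    /* hypothetical; SSPELL's CACHESIZE is set in config.h */
#define DEMO_MAXSTR    128  /* hypothetical maximum word length */

static char cache[DEMO_CACHESIZE][DEMO_MAXSTR];
static int cache_used = 0;  /* number of filled entries */
static int cache_next = 0;  /* next entry to overwrite (round robin) */

/* Return 1 if the word is in the cache, 0 otherwise. */
int cache_lookup(const char *word)
{
    int i;

    for (i = 0; i < cache_used; i++)
        if (strcmp(cache[i], word) == 0)
            return 1;
    return 0;
}

/* Remember a word that was found on disk; the oldest entry is
 * overwritten once the cache is full. */
void cache_insert(const char *word)
{
    strncpy(cache[cache_next], word, DEMO_MAXSTR - 1);
    cache[cache_next][DEMO_MAXSTR - 1] = '\0';
    cache_next = (cache_next + 1) % DEMO_CACHESIZE;
    if (cache_used < DEMO_CACHESIZE)
        cache_used++;
}
```

A caller would try cache_lookup first and search the word list on disk
only when it returns 0, calling cache_insert after a successful disk hit.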
A suffix/prefix list is used to allow a smaller dictionary to be used.
A stop file is provided to permit the exclusion of words. This is typically
used to exclude words that have been incorrectly identified as correct
by applying a rule in the rule list. The stop list is a plain ASCII
word list.
* Operation
Edit the config.h file to set up the required default locations and
compile the code. Place the dictionary in the file specified in the
config.h and make sure that the index file is writable. SSPELL should
now be ready for use.
The SEPARATOR variable should be set to the subdirectory separator for
your system (Unix '/', MS-DOS '\'). The paths to the index, dictionary
and rule files are determined by concatenating DICT_PATH with the
separator and the individual file names.
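The concatenation described above might look like the sketch below. The
function name build_path is hypothetical and does not come from the
SSPELL source. It uses snprintf, which is C99 and may not be available
on older compilers.

```c
#include <stdio.h>

/* Build "<dir><sep><name>" into out.
 * Returns 0 on success, -1 if the result does not fit in outsz bytes. */
int build_path(char *out, size_t outsz,
               const char *dir, const char *sep, const char *name)
{
    int n = snprintf(out, outsz, "%s%s%s", dir, sep, name);

    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

For example, build_path(buf, sizeof buf, "/usr/dict", "/", "main.dct")
leaves "/usr/dict/main.dct" in buf.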
Performance gains may be had by altering the parameters found in the
config.h file. Increasing CACHESIZE increases the memory usage of the
program, but decreases disk search time. IDXSIZ and HASHWID control
the size of the index to the disk file. HASHWID determines the maximum
number of characters compared to determine if an item occurs in a given
slot. IDXSIZ determines the number of slots.
A typical IBM-PC configuration (config.h) could be written as:
#define DICT_PATH "c:\\utility\\dict"
#define CFGNAME "sspell.cfg"
#define DICTIONARY "main.dct"
#define INDEX "main.idx"
#define STOP "main.stp"
#define RULE "rule.lst"
#define CACHESIZE 1000
#define ROOTNAME "sspell"
#define SORT "c:\\dos\\sort"
#define SEPARATOR "\\"
#define MAXSTR 128
#define SEPSTR " \n\r\t!@#$%^&*(),.<>~`\":;|/\\{}[]"
/* HASHWID must always be 2 or greater */
#define HASHWID 8
#define IDXSIZ 1000
* Environment Variable
A single Environment Variable named SSPELL is consulted by SSPELL.
If the environment variable is not set then the `hardwired' default
(i.e. the value found in the `config.h' file) will be used.
The Environment variable specifies a path which is concatenated with a
separator and a file name to locate the configuration, dictionary, index
and rule files.
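A minimal sketch of this fallback is given below. The names choose_dir,
dict_dir and DEMO_DICT_PATH are hypothetical, and treating an empty
SSPELL variable the same as an unset one is an assumption of the
example, not documented behaviour.

```c
#include <stdlib.h>

#define DEMO_DICT_PATH "/usr/dict"  /* stand-in for DICT_PATH from config.h */

/* Pick the environment value if it is set and non-empty (assumption),
 * otherwise the compiled-in default. */
const char *choose_dir(const char *env_value, const char *fallback)
{
    return (env_value != NULL && env_value[0] != '\0') ? env_value : fallback;
}

/* Directory in which the configuration, dictionary, index and rule
 * files are looked up. */
const char *dict_dir(void)
{
    return choose_dir(getenv("SSPELL"), DEMO_DICT_PATH);
}
```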
* Configuration file
If a configuration file (typically named "sspell.cfg") is present in the
default directory or the directory specified by the SSPELL environment
variable, the options contained in the file will override the defaults.
These configuration file options can be overridden by command line
options. Example configuration files are shown below:
# configuration file for SSPELL under MSDOS
DICT_PATH "c:\\utility\\dict"
DICTIONARY "main.dct"
INDEX "main.idx"
RULE "rule.lst"
STOP "main.stp"
SORT "c:\\dos\\sort"
# configuration file for SSPELL under Unix
DICT_PATH "/usr/dict"
DICTIONARY "main.dct"
INDEX "main.idx"
STOP "main.stp"
RULE "rule.lst"
SORT "sort -fu"
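Lines in this format could be read with something like the sketch below.
It is not the parser used by SSPELL: the function name parse_cfg_line is
hypothetical, and the handling of backslashes (a backslash escapes the
following character, as the doubled backslashes in the MS-DOS example
suggest) is an assumption.

```c
#include <ctype.h>
#include <string.h>

/* Parse one line of the form:  NAME "value"
 * Returns 1 and fills name/value on success; returns 0 for comments,
 * blank lines and malformed lines. Both buffers must be non-empty. */
int parse_cfg_line(const char *line, char *name, size_t namesz,
                   char *value, size_t valsz)
{
    size_t n = 0, v = 0;

    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '#' || *line == '\n' || *line == '\r' || *line == '\0')
        return 0;                   /* comment or blank line */

    /* Option name: everything up to the first whitespace. */
    while (*line != '\0' && !isspace((unsigned char)*line)) {
        if (n + 1 >= namesz)
            return 0;
        name[n++] = *line++;
    }
    name[n] = '\0';

    while (*line == ' ' || *line == '\t')
        line++;
    if (*line != '"')
        return 0;                   /* value must be quoted */
    line++;

    while (*line != '\0' && *line != '"') {
        if (*line == '\\' && line[1] != '\0')
            line++;                 /* assumed: backslash escapes next char */
        if (v + 1 >= valsz)
            return 0;
        value[v++] = *line++;
    }
    if (*line != '"')
        return 0;                   /* missing closing quote */
    value[v] = '\0';
    return 1;
}
```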
* Command Line
SSPELL has the following command line options:
sspell [-u] [-v] [-x] [-c config] [-D dict] [-I index] [-R rule]
[-C cachesize] [-S stop] [file] ...
-c `config' is the pathname of a configuration file.
-u Unsorted. The list of words produced is not sorted and contains
duplicates.
-v All words not actually in the word list are printed and plausible
derivations from the word list are indicated.
-x All plausible stems are output.
-D `dict' is the pathname of an alternate dictionary
-I `index' is the pathname of an alternate index. This should be
used if using a personalised dictionary or if the index file is
unwriteable.
-R `rule' is the pathname of an alternate rule list
-S `stop' is the pathname of an alternate stop file
-C `cachesize' is the size of the cache of words found in the
dictionary.
SSPELL will take input from a list of files on the command line or from
stdin if no files are supplied.
The dictionary must be in sorted order with the capital letters folded onto
the small letters (using Unix sort: sort -fu). The case of words in the
dictionary is significant. Any letter appearing as a capital in the
dictionary must appear as a capital in the text to be regarded as spelled
correctly.
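The case rule can be illustrated with the sketch below. The function
name case_match is hypothetical. The sketch accepts any extra capitals
in the text (for example all-capitals words), which is one reading of
the rule and not necessarily what SSPELL itself does.

```c
#include <ctype.h>

/* Return 1 if text matches the dictionary entry under the case rule:
 * the letters must agree when case is ignored, and every capital in
 * the dictionary entry must also be a capital in the text. */
int case_match(const char *dict, const char *text)
{
    for (; *dict != '\0' && *text != '\0'; dict++, text++) {
        unsigned char d = (unsigned char)*dict;
        unsigned char t = (unsigned char)*text;

        if (tolower(d) != tolower(t))
            return 0;               /* different letters */
        if (isupper(d) && !isupper(t))
            return 0;               /* dictionary capital missing in text */
    }
    return *dict == '\0' && *text == '\0';
}
```

With a dictionary entry "Monday", the text "Monday" matches and "monday"
does not. With a dictionary entry "house", both "house" and "House" match.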
The format of the rule list is fixed. `#' in the first column indicates a
comment. All other lines are of the form:
pre|post <prefix/suffix> <required> <forbidden> <delete>
Any field not used must be filled with a `-'. The following examples
illustrate the features of the rules.
pre un - - -
post ive - e -
post ive e - e
post ied y ay,ey,iy,oy,uy y
The prefix rules are simple: there are no required or forbidden sequences
and nothing to delete. Prefixes must not be more complex.
The suffix rules are more complex. These rules specify the ending to be
added to the root after the deletion of the delete field, provided that
the word has the required ending and that the combination is not
forbidden.
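Read in reverse (from a word in the document back to a candidate root),
the suffix checks can be sketched as follows. This is an interpretation
of the rule format and not the SSPELL source: the struct layout and the
names apply_suffix_rule and ends_with are hypothetical, and forbidden
endings longer than 15 characters are silently skipped in this sketch.

```c
#include <string.h>

struct rule {
    const char *suffix;     /* ending to strip from the word            */
    const char *required;   /* ending the root must have, or "-"        */
    const char *forbidden;  /* comma-separated endings the root must
                               not have, or "-"                         */
    const char *del;        /* ending deleted from the root, or "-"     */
};

static int ends_with(const char *s, const char *end)
{
    size_t ls = strlen(s), le = strlen(end);

    return le <= ls && strcmp(s + ls - le, end) == 0;
}

/* Returns 1 and writes the candidate root (to be looked up in the
 * dictionary) into root; returns 0 if the rule does not apply. */
int apply_suffix_rule(const struct rule *r, const char *word,
                      char *root, size_t rootsz)
{
    size_t wl = strlen(word), sl = strlen(r->suffix);
    size_t stem, dl;
    const char *p;

    if (sl >= wl || !ends_with(word, r->suffix))
        return 0;
    stem = wl - sl;
    dl = (strcmp(r->del, "-") == 0) ? 0 : strlen(r->del);
    if (stem + dl + 1 > rootsz)
        return 0;

    memcpy(root, word, stem);           /* strip the suffix              */
    memcpy(root + stem, r->del, dl);    /* put back the deleted ending   */
    root[stem + dl] = '\0';

    if (strcmp(r->required, "-") != 0 && !ends_with(root, r->required))
        return 0;

    if (strcmp(r->forbidden, "-") != 0) {
        for (p = r->forbidden; *p != '\0'; ) {
            char item[16];
            size_t n = strcspn(p, ",");

            if (n > 0 && n < sizeof item) {
                memcpy(item, p, n);
                item[n] = '\0';
                if (ends_with(root, item))
                    return 0;           /* root has a forbidden ending   */
            }
            p += n;
            if (*p == ',')
                p++;
        }
    }
    return 1;
}
```

With the rule `post ive - e -' the sketch turns 'transitive' into the
candidate root 'transit', which would then be searched for in the
dictionary.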
Example rule:
post ive - e -
The word 'transitive' is found in the document. The suffix 'ive' is
removed and there is no deleted suffix to restore. The new word
'transit' does not end in the forbidden suffix 'e' and there is
no required ending, so a search is made in the dictionary for 'transit'.
The word 'deceive' is found in the document, the suffix 'ive' is